covariance parameter
Spatial Covariance Constraints for Gaussian Mixture Models
Lu, Hanzhang, Malott, Keiran, Bitra, Venkat Suprabath, Milligan, Kirsty, Subedi, Sanjeena, Cassol, Edana, Chauhan, Vinita, McNairn, Connor, Muir, Bryan, Pasricha, Prarthana, Murugkar, Sangeeta, Thomson, Rowan, Jirasek, Andrew, Andrews, Jeffrey L.
Although extensive research exists in spatial modeling, few studies have addressed finite mixture model-based clustering methods for spatial data. Finite mixture models, especially Gaussian mixture models, particularly suffer from high dimensionality due to the number of free covariance parameters. This study introduces a spatial covariance constraint for Gaussian mixture models that requires only four free parameters for each component, independent of dimensionality. Using a coordinate system, the spatially constrained Gaussian mixture model enables clustering of multi-way spatial data and inference of spatial patterns. The parameter estimation is conducted by combining the expectation-maximization (EM) algorithm with the generalized least squares (GLS) estimator. Simulation studies and applications to Raman spectroscopy data are provided to demonstrate the proposed model.
MCMC for Variationally Sparse Gaussian Processes
James Hensman, Alexander G. Matthews, Maurizio Filippone, Zoubin Ghahramani
Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has been made into attacking three issues with GP models: how to compute efficiently when the number of data is large; how to approximate the posterior when the likelihood is not Gaussian and how to estimate covariance function parameter posteriors. This paper simultaneously addresses these, using a variational approximation to the posterior which is sparse in support of the function but otherwise free-form. The result is a Hybrid Monte-Carlo sampling scheme which allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs. Code to replicate each experiment in this paper is available at github.com/sparseMCMC .
Multidimensional Distributional Neural Network Output Demonstrated in Super-Resolution of Surface Wind Speed
Goldwyn, Harrison J., Krock, Mitchell, Rudi, Johann, Getter, Daniel, Bessac, Julie
Accurate quantification of uncertainty in neural network predictions remains a central challenge for scientific applications involving high-dimensional, correlated data. While existing methods capture either aleatoric or epistemic uncertainty, few offer closed-form, multidimensional distributions that preserve spatial correlation while remaining computationally tractable. In this work, we present a framework for training neural networks with a multidimensional Gaussian loss, generating closed-form predictive distributions over outputs with non-identically distributed and heteroscedastic structure. Our approach captures aleatoric uncertainty by iteratively estimating the means and covariance matrices, and is demonstrated on a super-resolution example. We leverage a Fourier representation of the covariance matrix to stabilize network training and preserve spatial correlation. We introduce a novel regularization strategy -- referred to as information sharing -- that interpolates between image-specific and global covariance estimates, enabling convergence of the super-resolution downscaling network trained on image-specific distributional loss functions. This framework allows for efficient sampling, explicit correlation modeling, and extensions to more complex distribution families all without disrupting prediction performance. We demonstrate the method on a surface wind speed downscaling task and discuss its broader applicability to uncertainty-aware prediction in scientific models.
Variational Deep Learning via Implicit Regularization
Wenger, Jonathan, Coker, Beau, Marusic, Juraj, Cunningham, John P.
Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters and optimization procedure. However, deploying deep learning models out-of-distribution, in sequential decision-making tasks, or in safety-critical domains, necessitates reliable uncertainty quantification, not just a point estimate. The machinery of modern approximate inference -- Bayesian deep learning -- should answer the need for uncertainty quantification, but its effectiveness has been challenged by our inability to define useful explicit inductive biases through priors, as well as the associated computational burden. Instead, in this work we demonstrate, both theoretically and empirically, how to regularize a variational deep network implicitly via the optimization procedure, just as for standard deep learning. We fully characterize the inductive bias of (stochastic) gradient descent in the case of an overparametrized linear model as generalized variational inference and demonstrate the importance of the choice of parametrization. Finally, we show empirically that our approach achieves strong in- and out-of-distribution performance without tuning of additional hyperparameters and with minimal time and memory overhead over standard deep learning.
An accuracy-runtime trade-off comparison of scalable Gaussian process approximations for spatial data
Rambelli, Filippo, Sigrist, Fabio
Gaussian processes (GPs) are flexible, probabilistic, non-parametric models widely employed in various fields such as spatial statistics, time series analysis, and machine learning. A drawback of Gaussian processes is their computational cost having $\mathcal{O}(N^3)$ time and $\mathcal{O}(N^2)$ memory complexity which makes them prohibitive for large datasets. Numerous approximation techniques have been proposed to address this limitation. In this work, we systematically compare the accuracy of different Gaussian process approximations concerning marginal likelihood evaluation, parameter estimation, and prediction taking into account the time required to achieve a certain accuracy. We analyze this trade-off between accuracy and runtime on multiple simulated and large-scale real-world datasets and find that Vecchia approximations consistently emerge as the most accurate in almost all experiments. However, for certain real-world data sets, low-rank inducing point-based methods, i.e., full-scale and modified predictive process approximations, can provide more accurate predictive distributions for extrapolation.
MCMC for Variationally Sparse Gaussian Processes
Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has been made into attacking three issues with GP models: how to compute efficiently when the number of data is large; how to approximate the posterior when the likelihood is not Gaussian and how to estimate covariance function parameter posteriors. This paper simultaneously addresses these, using a variational approximation to the posterior which is sparse in support of the function but otherwise free-form. The result is a Hybrid Monte-Carlo sampling scheme which allows for a non-Gaussian approximation over the function values and covariance parameters simultaneously, with efficient computations based on inducing-point sparse GPs. Code to replicate each experiment in this paper is available at github.com/sparseMCMC.
Fast covariance parameter estimation of spatial Gaussian process models using neural networks
Gerber, Florian, Nychka, Douglas W.
Gaussian processes (GPs) are a popular model for spatially referenced data and allow descriptive statements, predictions at new locations, and simulation of new fields. Often a few parameters are sufficient to parameterize the covariance function, and maximum likelihood (ML) methods can be used to estimate these parameters from data. ML methods, however, are computationally demanding. For example, in the case of local likelihood estimation, even fitting covariance models on modest size windows can overwhelm typical computational resources for data analysis. This limitation motivates the idea of using neural network (NN) methods to approximate ML estimates. We train NNs to take moderate size spatial fields or variograms as input and return the range and noise-to-signal covariance parameters. Once trained, the NNs provide estimates with a similar accuracy compared to ML estimation and at a speedup by a factor of 100 or more. Although we focus on a specific covariance estimation problem motivated by a climate science application, this work can be easily extended to other, more complex, spatial problems and provides a proof-of-concept for this use of machine learning in computational statistics.
Gaussian Process Boosting
In this article, we propose a novel way to combine boosting with Gaussian process and mixed effects models. Boosting [Freund and Schapire, 1996, Breiman, 1998, Friedman et al., 2000, Mason et al., 2000, Friedman, 2001, Bühlmann and Hothorn, 2007] is a machine learning technique that achieves superior predictive performance for a large variety of datasets [Chen and Guestrin, 2016, Nielsen, 2016]. Apart from this, the wide adoption of treeboosting in applied machine learning and data science is due to several advantages: boosting with trees as base learners can automatically account for complex non-linearities, discontinuities, and high-order interactions, it is robust to outliers in and multicollinearity among predictor variables, it is scale-invariant to monotone transformations of the predictor variables, it can handle missing values in predictor variables automatically by loosing almost no information [Elith et al., 2008], and boosting can perform variable selection. Gaussian processes [Williams and Rasmussen, 2006], on the other hand, are flexible nonparametric function models that achieve state-of-the-art predictive accuracy and allow for making probabilistic predictions [Gneiting et al., 2007]. Gaussian process and mixed effects models are used, for instance, for nonparametric regression, modeling of time series [Shumway and Stoffer, 2017], spatial [Banerjee et al., 2014], spatiotemporal [Cressie and Wikle, 2015], panel or longitudinal, and hierarchically clustered or grouped
Knot Selection in Sparse Gaussian Processes with a Variational Objective
Garton, Nathaniel, Niemi, Jarad, Carriquiry, Alicia
Sparse, knot-based Gaussian processes have enjoyed considerable success as scalable approximations to full Gaussian processes. Certain sparse models can be derived through specific variational approximations to the true posterior, and knots can be selected to minimize the Kullback-Leibler divergence between the approximate and true posterior. While this has been a successful approach, simultaneous optimization of knots can be slow due to the number of parameters being optimized. Furthermore, there have been few proposed methods for selecting the number of knots, and no experimental results exist in the literature. We propose using a one-at-a-time knot selection algorithm based on Bayesian optimization to select the number and locations of knots. We showcase the competitive performance of this method relative to simultaneous optimization of knots on three benchmark data sets, but at a fraction of the computational cost.
Knot Selection in Sparse Gaussian Processes
Garton, Nathaniel, Niemi, Jarad, Carriquiry, Alicia
Knot-based, sparse Gaussian processes have enjoyed considerable success as scalable approximations to full Gaussian processes. Problems can occur, however, when knot selection is done by optimizing the marginal likelihood. For example, the marginal likelihood surface is highly multimodal, which can cause suboptimal knot placement where some knots serve practically no function. This is especially a problem when many more knots are used than are necessary, resulting in extra computational cost for little to no gains in accuracy. We propose a one-at-a-time knot selection algorithm to select both the number and placement of knots. Our algorithm uses Bayesian optimization to efficiently propose knots that are likely to be good and largely avoids the pathologies encountered when using the marginal likelihood as the objective function. We provide empirical results showing improved accuracy and speed over the current standard approaches.